Goto

Collaborating Authors

 transfer 0



Topological Data Analysis for Unsupervised Anomaly Detection and Customer Segmentation on Banking Data

Barberi, Leonardo Aldo Alejandro, De Cave, Linda Maria

arXiv.org Artificial Intelligence

This paper introduces advanced techniques of Topological Data Analysis (TDA) for unsupervised anomaly detection and customer segmentation in banking data. Using the Mapper algorithm and persistent homology, we develop unsupervised procedures that uncover meaningful patterns in customers' banking data by exploiting topological information. The framework we present in this paper yields actionable insights that combine the abstract mathematical subject of topology with real-life use cases that are useful in industry.



Inverse Transition Learning: Learning Dynamics from Demonstrations

Benac, Leo, Sharma, Abhishek, Parbhoo, Sonali, Doshi-Velez, Finale

arXiv.org Machine Learning

We consider the problem of estimating the transition dynamics $T^*$ from near-optimal expert trajectories in the context of offline model-based reinforcement learning. We develop a novel constraint-based method, Inverse Transition Learning, that treats the limited coverage of the expert trajectories as a \emph{feature}: we use the fact that the expert is near-optimal to inform our estimate of $T^*$. We integrate our constraints into a Bayesian approach. Across both synthetic environments and real healthcare scenarios like Intensive Care Unit (ICU) patient management in hypotension, we demonstrate not only significant improvements in decision-making, but that our posterior can inform when transfer will be successful.


TartuNLP at EvaLatin 2024: Emotion Polarity Detection

Dorkin, Aleksei, Sirts, Kairit

arXiv.org Artificial Intelligence

This paper presents the TartuNLP team submission to EvaLatin 2024 shared task of the emotion polarity detection for historical Latin texts. Our system relies on two distinct approaches to annotating training data for supervised learning: 1) creating heuristics-based labels by adopting the polarity lexicon provided by the organizers and 2) generating labels with GPT4. We employed parameter efficient fine-tuning using the adapters framework and experimented with both monolingual and cross-lingual knowledge transfer for training language and task adapters. Our submission with the LLM-generated labels achieved the overall first place in the emotion polarity detection task. Our results show that LLM-based annotations show promising results on texts in Latin.


Newell's theory based feature transformations for spatio-temporal traffic prediction

Sengupta, Agnimitra, Guler, S. Ilgin

arXiv.org Artificial Intelligence

Deep learning (DL) models for spatio-temporal traffic flow forecasting employ convolutional or graph-convolutional filters along with recurrent neural networks to capture spatial and temporal dependencies in traffic data. These models, such as CNN-LSTM, utilize traffic flows from neighboring detector stations to predict flows at a specific location of interest. However, these models are limited in their ability to capture the broader dynamics of the traffic system, as they primarily learn features specific to the detector configuration and traffic characteristics at the target location. Hence, the transferability of these models to different locations becomes challenging, particularly when data is unavailable at the new location for model training. To address this limitation, we propose a traffic flow physics-based feature transformation for spatio-temporal DL models. This transformation incorporates Newell's uncongested and congested-state estimators of traffic flows at the target locations, enabling the models to learn broader dynamics of the system. Our methodology is empirically validated using traffic data from two different locations. The results demonstrate that the proposed feature transformation improves the models' performance in predicting traffic flows over different prediction horizons, as indicated by better goodness-of-fit statistics. An important advantage of our framework is its ability to be transferred to new locations where data is unavailable. This is achieved by appropriately accounting for spatial dependencies based on station distances and various traffic parameters. In contrast, regular DL models are not easily transferable as their inputs remain fixed. It should be noted that due to data limitations, we were unable to perform spatial sensitivity analysis, which calls for further research using simulated data.


Informed Priors for Knowledge Integration in Trajectory Prediction

Schlauch, Christian, Klein, Nadja, Wirth, Christian

arXiv.org Artificial Intelligence

Informed machine learning methods allow the integration of prior knowledge into learning systems. This can increase accuracy and robustness or reduce data needs. However, existing methods often assume hard constraining knowledge, that does not require to trade-off prior knowledge with observations, but can be used to directly reduce the problem space. Other approaches use specific, architectural changes as representation of prior knowledge, limiting applicability. We propose an informed machine learning method, based on continual learning. This allows the integration of arbitrary, prior knowledge, potentially from multiple sources, and does not require specific architectures. Furthermore, our approach enables probabilistic and multi-modal predictions, that can improve predictive accuracy and robustness. We exemplify our approach by applying it to a state-of-the-art trajectory predictor for autonomous driving. This domain is especially dependent on informed learning approaches, as it is subject to an overwhelming large variety of possible environments and very rare events, while requiring robust and accurate predictions. We evaluate our model on a commonly used benchmark dataset, only using data already available in a conventional setup. We show that our method outperforms both non-informed and informed learning methods, that are often used in the literature. Furthermore, we are able to compete with a conventional baseline, even using half as many observation examples.


Forecasting new diseases in low-data settings using transfer learning

Roster, Kirstin, Connaughton, Colm, Rodrigues, Francisco A.

arXiv.org Artificial Intelligence

Recent infectious disease outbreaks, such as the COVID-19 pandemic and the Zika epidemic in Brazil, have demonstrated both the importance and difficulty of accurately forecasting novel infectious diseases. When new diseases first emerge, we have little knowledge of the transmission process, the level and duration of immunity to reinfection, or other parameters required to build realistic epidemiological models. Time series forecasts and machine learning, while less reliant on assumptions about the disease, require large amounts of data that are also not available in early stages of an outbreak. In this study, we examine how knowledge of related diseases can help make predictions of new diseases in data-scarce environments using transfer learning. We implement both an empirical and a theoretical approach. Using empirical data from Brazil, we compare how well different machine learning models transfer knowledge between two different disease pairs: (i) dengue and Zika, and (ii) influenza and COVID-19. In the theoretical analysis, we generate data using different transmission and recovery rates with an SIR compartmental model, and then compare the effectiveness of different transfer learning methods. We find that transfer learning offers the potential to improve predictions, even beyond a model based on data from the target disease, though the appropriate source disease must be chosen carefully. While imperfect, these models offer an additional input for decision makers during pandemic response.


Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Cogswell, Michael, Lu, Jiasen, Jain, Rishabh, Lee, Stefan, Parikh, Devi, Batra, Dhruv

arXiv.org Artificial Intelligence

Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people? Such agents could leverage a larger variety of existing data to generalize to new tasks, minimizing expensive data collection and annotation. In this work, we study a setting we call "Dialog without Dialog", which requires agents to develop visually grounded dialog models that can adapt to new tasks without language level supervision. By factorizing intention and language, our model minimizes linguistic drift after fine-tuning for new tasks. We present qualitative results, automated metrics, and human studies that all show our model can adapt to new tasks and maintain language quality. Baselines either fail to perform well at new tasks or experience language drift, becoming unintelligible to humans. Code has been made available at https://github.com/mcogswell/dialog_without_dialog


Data-Driven Compression of Convolutional Neural Networks

Pahwa, Ramit, Arivazhagan, Manoj Ghuhan, Garg, Ankur, Krishnamoorthy, Siddarth, Saxena, Rohit, Choudhary, Sunav

arXiv.org Machine Learning

Deploying trained convolutional neural networks (CNNs) to mobile devices is a challenging task because of the simultaneous requirements of the deployed model to be fast, lightweight and accurate. Designing and training a CNN architecture that does well on all three metrics is highly non-trivial and can be very time-consuming if done by hand. One way to solve this problem is to compress the trained CNN models before deploying to mobile devices. This work asks and answers three questions on compressing CNN models automatically: a) How to control the trade-off between speed, memory and accuracy during model compression? b) In practice, a deployed model may not see all classes and/or may not need to produce all class labels. Can this fact be used to improve the trade-off? c) How to scale the compression algorithm to execute within a reasonable amount of time for many deployments? The paper demonstrates that a model compression algorithm utilizing reinforcement learning with architecture search and knowledge distillation can answer these questions in the affirmative. Experimental results are provided for current state-of-the-art CNN model families for image feature extraction like VGG and ResNet with CIFAR datasets.